AGENDA

■ Script profiling & optimization 

■ New VM and how it affects you 

■ Mental model for performance 

■ Case study: OOP 



PERFORMANCE OVERVIEW 

Measure seven times, optimize once 







SCRIPT OPTIMIZATION 

■ Why? 


° Better user experience

° Wider reach (platforms with slow CPUs)

° Larger scale (200 player servers anyone?)

■ Keep in mind:

° Code must be correct before being fast

° Balance performance and maintainability


The rest of the talk assumes you want maximum performance 





SCRIPT OPTIMIZATION: PROCESS 

■ While (code is not fast enough) do

° Measure script execution time

° Identify hot spots using profiling tools

° Change code

° Measure script execution time again



■ How do you know if code isn’t fast enough? 

■ How do you know how to change code to be faster? 
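
The measure / change / measure loop can be sketched with nothing more than os.clock. This is a minimal illustration, not the profiling tooling the talk describes: the helper name measure and the two sumSquares functions are hypothetical, and real measurements should come from the profilers covered on the following slides.

```lua
-- Hypothetical helper: average seconds per call of `fn` over `iterations` runs.
local function measure(fn, iterations)
    local start = os.clock()
    for _ = 1, iterations do
        fn()
    end
    return (os.clock() - start) / iterations
end

-- "Before" candidate: builds a temporary table, then sums it.
local function sumSquaresSlow(n)
    local parts = {}
    for i = 1, n do
        parts[#parts + 1] = i * i
    end
    local total = 0
    for _, v in ipairs(parts) do
        total = total + v
    end
    return total
end

-- "After" candidate: same result without the temporary table.
local function sumSquaresFast(n)
    local total = 0
    for i = 1, n do
        total = total + i * i
    end
    return total
end

local before = measure(function() sumSquaresSlow(1000) end, 1000)
local after = measure(function() sumSquaresFast(1000) end, 1000)
print(string.format("before: %.2f us/call, after: %.2f us/call", before * 1e6, after * 1e6))
```

The check that both versions return the same value comes first: code must be correct before being fast.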




SCRIPT PROFILING: SCRIPT PERFORMANCE 


■ Available in Studio and Developer Console 

° Activity: How much of total available frame time does my script take? (100% = 33 ms at 30 Hz) 

° Rate: How often does my script run? (30/s = every frame at 30 Hz, or every other frame at 60 Hz) 



SCRIPT PROFILING: MICROPROFILE

■ Microprofile 

° How long does my script take to run each frame? 

° How much time do API calls inside my script take? 

■ debug.profilebegin/debug.profileend 

° How long do individual sections in my script take? 

■ Available on client (desktop & mobile) and server 
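
A small sketch of labelling sections with debug.profilebegin/debug.profileend. The function updateNPCs, its labels and the NPC fields are hypothetical; the no-op fallbacks are an assumption added only so the snippet also runs in a plain Lua interpreter, where these two functions do not exist.

```lua
-- In Roblox these are the real profiling calls; elsewhere fall back to no-ops
-- (the fallback is an assumption for portability, not part of the API).
local profilebegin = debug.profilebegin or function(label) end
local profileend = debug.profileend or function() end

-- Hypothetical per-frame update split into two labelled sections.
local function updateNPCs(npcs)
    profilebegin("NPC movement")
    for _, npc in ipairs(npcs) do
        npc.x = npc.x + npc.vx
    end
    profileend()

    profilebegin("NPC healing")
    for _, npc in ipairs(npcs) do
        npc.hp = math.min(npc.maxhp, npc.hp + 1)
    end
    profileend()
end

local npcs = { { x = 0, vx = 2, hp = 10, maxhp = 10 } }
updateNPCs(npcs)
```

Every profilebegin needs a matching profileend on all code paths, otherwise the labelled sections no longer nest correctly.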
LUAU VM

TL;DR: Your Lua code is magically faster 







NEW VM: WHY? 


■ More and more focus on performance 

■ Scripts becoming a bottleneck for some games/apps 

■ Issues aren’t just at Lua-C boundary 

■ We decided to fix it by writing a new VM from scratch 




NEW VM: WHAT DID WE DO? 

■ New compiler 

■ New bytecode 

■ New fast interpreter 

■ New zero-overhead debugger 

■ Faster reflection layer for Lua-C API calls 


■ Faster garbage collector





NEW VM: WHAT DID WE NOT DO? 


■ We didn’t change Lua syntax 

■ We didn’t change Lua semantics (hopefully) 

■ We didn’t rewrite standard library 


■ We didn’t implement JIT (yet)




NEW VM: PERFORMANCE EXPECTATIONS 


■ Assuming your code uses few Roblox API calls, 

° Game code will run up to ~2-3x faster in client / server 

° Plugin code will run up to ~2-3x faster in Studio 

° Game code will run up to ~5-7x faster in Studio (zero-overhead debugger) 

■ We optimized Roblox API layer in both VMs: 

° Property access is now ~30-50% faster in both VMs
° Roblox API calls are now ~50-70% faster in both VMs

■ ... and in new VM, property/API access is additionally ~20-30% faster

■ If your code mostly called *expensive* Roblox APIs like raycasts,
expect a small performance increase





NEW VM: PERFORMANCE NUMBERS 


Benchmark            Speedup
for k,v in pairs     4.90
for i,v in ipairs    6.57
OOP: constructor     4.63
OOP: method call     1.96
math.sqrt            3.11
TerrainGenerator
Roact
NBody                3.11
Custom NPC           1.68
Game of Life         2.66



NEW VM: ROLLOUT

■ Live in Studio as an option 

° If you’re a beta user, it’s in the Beta Features list 

■ Live in Roblox mobile app code 

° Doesn’t affect you, but has 8 MB of real-world Lua code running well 

■ Live for several top games 

■ Test it and report bugs! 

■ We will completely switch to new VM this year. 





THINKING ABOUT PERFORMANCE 

How to tell, at a glance, how fast or slow code is? 






MENTAL MODEL: TIERS 


■ Predicting performance without measuring is valuable 

° Even if prediction is off. 

° Helps you write efficient code 
° Helps you identify potential hotspots 

■ If you know the cost of each line of your code, predicting is easy... 

■ ... but there are many different operations with different costs. 

■ Let’s split primitive operations into tiers! 




TIER 0: PRIMITIVES


■ Operations that VM can do directly and quickly 

■ Local variable access 

■ Arithmetic: +, *, /, % on numbers (caveat: except for ^)

■ String/array length: # (caveat: beware of arrays with holes)

■ Built-in global access (evaluating math.sqrt, not calling it)

■ Array access via [ ] and numeric index 

■ Custom global access 

■ Lua object field access via . 





TIER 0: PRIMITIVES

■ Isn’t table access slow?

■ New VM has special optimizations for:

° Built-in globals like math.sqrt: new VM feature, “imports”

° Direct table field access like obj.field: new VM feature, “inline caching”

° Custom globals like foo: also using inline caching

-- before: built-ins and fields cached in locals by hand

local sqrt = math.sqrt
for i = 1, nbody do
    local bi = bodies[i]
    local bix, biy, biz, bimass = bi.x, bi.y, bi.z, bi.mass
    local bivx, bivy, bivz = bi.vx, bi.vy, bi.vz
    for j = i + 1, nbody do
        local bj = bodies[j]
        local dx, dy, dz = bix - bj.x, biy - bj.y, biz - bj.z
        local distance = sqrt(dx*dx + dy*dy + dz*dz)
        -- ... rest of the inner loop
    end
end

-- after: direct field and built-in access, fast enough in the new VM

for i = 1, nbody do
    local bi = bodies[i]
    for j = i + 1, nbody do
        local bj = bodies[j]
        local dx, dy, dz = bi.x - bj.x, bi.y - bj.y, bi.z - bj.z
        local distance = math.sqrt(dx*dx + dy*dy + dz*dz)
        -- ... rest of the inner loop
    end
end




TIER 1: CALLS 


■ Operations that require calling a C/Lua function

■ Calling a C function from the standard library (string.find etc.)

■ Calling a Lua function

■ Table access with custom Lua __index/__newindex metamethod

■ Property access and method calls for Roblox built-in types

■ Property access and method calls for Roblox instances
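
To make the metamethod case concrete, here is a small sketch in plain Lua. The tables plain and proxy and the counter calls are hypothetical names; the point is only that a read that looks like a field access becomes a function call once a custom __index function is involved.

```lua
local calls = 0

-- Plain table: `plain.hp` is a direct field read (Tier 0).
local plain = { hp = 100 }

-- Proxy table: the key is missing, so every read runs a Lua function (Tier 1).
local proxy = setmetatable({}, {
    __index = function(_, key)
        calls = calls + 1
        return plain[key]
    end,
})

local total = 0
for _ = 1, 1000 do
    total = total + plain.hp   -- no function call
    total = total + proxy.hp   -- one __index call per read
end

print(calls, total)
```

Both reads return the same value, but only the proxy read pays for a call each time.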






TIER 1: CALLS 


■ Different calls have somewhat different performance characteristics 

■ Some built-in C calls (like math.sqrt) will get faster in future releases 


Call type                                  Time/call (1ns = 1ms per 1M calls)
Built-in C (math.sqrt)                     20ns
Simple Lua (function(x) return x*x end)    24ns
Roblox types (Vector3)                     43ns
Roblox instances (Part)                    62ns
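
As a rough worked example, the per-call timings listed above can be turned into a per-frame call budget. The 60 Hz frame time is an assumption for illustration, and the result is an upper bound that ignores everything else a frame has to do.

```lua
-- Time per call in nanoseconds, copied from the timings listed above.
local timings = {
    { name = "Built-in C (math.sqrt)", ns = 20 },
    { name = "Simple Lua function", ns = 24 },
    { name = "Roblox types (Vector3)", ns = 43 },
    { name = "Roblox instances (Part)", ns = 62 },
}

-- Assumption: a 60 Hz frame, i.e. 1/60 of a second expressed in nanoseconds.
local frameNs = 1e9 / 60

-- How many calls of a given cost fit into one whole frame.
local function callsPerFrame(ns)
    return math.floor(frameNs / ns)
end

for _, entry in ipairs(timings) do
    print(string.format("%-26s %d calls/frame", entry.name, callsPerFrame(entry.ns)))
end
```

A script that should use only a few percent of the frame gets a correspondingly smaller share of these numbers.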




TIER 2: ALLOCATIONS 

■ Operations that require creating new Lua/C objects 

■ Creating a table 

■ Creating a function 

■ Creating Roblox built-in objects (CFrame) 

■ Beware hidden allocations!

° Creating a closure allocates once per unique upvalue captured! 

° Performing math on Vector3 or reading properties of Vector3 type allocates! 
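
A sketch of the hidden closure allocation in plain Lua. The function and field names are hypothetical; the first version creates a fresh closure (capturing item) on every iteration, the second reuses one function and passes the item as an argument.

```lua
-- Allocates a new closure, with its captured upvalue, for every item.
local function makeHandlersPerItem(items)
    local handlers = {}
    for i, item in ipairs(items) do
        handlers[i] = function() return item.name end
    end
    return handlers
end

-- Allocates nothing per item: one shared function, item passed explicitly.
local function getName(item)
    return item.name
end

local items = { { name = "a" }, { name = "b" }, { name = "c" } }
local handlers = makeHandlersPerItem(items)

print(handlers[1] ~= handlers[2])        -- distinct closure objects
print(handlers[2](), getName(items[2]))  -- same result either way
```

Whether the closure version is acceptable depends on how often it runs; inside a per-frame loop the shared function avoids the extra garbage.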





TIER 2: ALLOCATION COST 


■ For each allocation, you pay multiple times. 

■ Once, when you make the allocation. 

■ Several times, when garbage collector looks at the object to make sure it’s necessary. 

■ Once, when the object is deallocated by garbage collector. 
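
One way to see how much a piece of code allocates is to compare heap size before and after it runs. This is only a sketch: collectgarbage("count") reports kilobytes in use, the loop and table shape are made up for illustration, and the number shows the allocation itself, not the later garbage collector work.

```lua
-- Heap size in kilobytes before allocating.
local before = collectgarbage("count")

-- Keep every table alive so the collector cannot reclaim any of them.
local kept = {}
for i = 1, 100000 do
    kept[i] = { x = i, y = i, z = i }   -- one table allocation per iteration
end

local after = collectgarbage("count")
print(string.format("heap grew by about %.0f KB for %d tables", after - before, #kept))
```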





TIER 3: INSTANCE MANIPULATION 


■ Operations that require changing instance hierarchy

■ Setting property of unparented instance

■ Creating a Roblox instance (Part)

■ Setting property of a parented instance

■ Setting Parent property
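
Following the list above, a common pattern is to set properties while the instance is still unparented and to assign Parent last. The sketch below assumes that ordering is the cheaper one, as the tier list suggests. makePart is a hypothetical helper, and the stand-in Instance and workspace tables exist only so the snippet also runs outside Roblox.

```lua
-- Stand-ins for running outside Roblox (assumption: plain tables are enough to
-- show the ordering). Inside Roblox the real globals are used instead.
local Instance = Instance or { new = function(className) return { ClassName = className } end }
local workspace = workspace or { Name = "Workspace" }

-- Hypothetical helper: configure first, parent last.
local function makePart(name)
    local part = Instance.new("Part")
    part.Name = name          -- property set on an unparented instance
    part.Anchored = true      -- still unparented
    part.Parent = workspace   -- the hierarchy change happens once, at the end
    return part
end

local wall = makePart("Wall")
print(wall.Name, wall.Parent == workspace)
```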






CASE STUDY: OOP 






OOP: CLOSURES OR METATABLES? 

■ Lua doesn’t have classes! 

■ Everybody reinvents their own OOP 

■ Two major approaches: closures and metatables 




OOP: CLOSURES 


NPC = {}

function NPC.new()
    local self = {}

    -- state lives in upvalues captured by the methods below
    local maxhp = 200
    local hp = maxhp

    function self.Heal(deltahp)
        hp = math.min(maxhp, hp + deltahp)
    end

    function self.GetHP()
        return hp
    end

    return self
end




OOP: CLOSURES - PERFORMANCE PROBLEMS

■ No direct access to object fields!

° GetHP() is costly

■ Creation is very expensive!

° 1 self table ({})

° 2 function objects

° 2 closed upvalues (maxhp & hp)

° ... = 5 allocations!

■ self, maxhp, hp allocated separately

° Poor memory locality

° Bad for CPUs




OOP: METATABLES 

NPC = {}

function NPC.new()
    local self = {}

    self.maxhp = 200
    self.hp = self.maxhp

    setmetatable(self, {__index = NPC})

    return self
end

function NPC:Heal(deltahp)
    self.hp = math.min(self.maxhp, self.hp + deltahp)
end




OOP: METATABLES - PERFORMANCE PROBLEMS

■ Direct access to fields

■ Fields allocated together

■ Creation still expensive!

° Repeated field setup re-allocates tables

° New metatable created for each object
OOP: METATABLES - FIXED


NPC = {}
NPC.__index = NPC

function NPC.new()
    local maxhp = 200

    -- all fields created in one table constructor
    local self = {maxhp = maxhp, hp = maxhp}

    -- one shared metatable for every object
    setmetatable(self, NPC)

    return self
end

function NPC:Heal(deltahp)
    self.hp = math.min(self.maxhp, self.hp + deltahp)
end


■ Direct access to fields

■ Fields allocated together

■ Fast creation

■ Field access doesn’t require __index

■ Method access uses __index table

° Special fast-path in VM
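
A short usage sketch of the fixed metatable pattern, showing that objects share one metatable and carry no per-object method fields. The Damage method is hypothetical and added only so the example has something to heal.

```lua
local NPC = {}
NPC.__index = NPC

function NPC.new()
    local maxhp = 200
    return setmetatable({ maxhp = maxhp, hp = maxhp }, NPC)
end

function NPC:Heal(deltahp)
    self.hp = math.min(self.maxhp, self.hp + deltahp)
end

-- Hypothetical extra method, not part of the original slides.
function NPC:Damage(deltahp)
    self.hp = math.max(0, self.hp - deltahp)
end

local a, b = NPC.new(), NPC.new()
a:Damage(50)
a:Heal(20)
b:Heal(1000)   -- clamped to maxhp

print(a.hp, b.hp)
print(getmetatable(a) == getmetatable(b))  -- one shared metatable
print(rawget(a, "Heal"))                   -- methods live on NPC, not on each object
```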


local v = {}
v.X = 1
v.Y = 2
v.Z = 3

■ Old VM: 570ns
■ New VM: 330ns


local v = {
    X = 1,
    Y = 2,
    Z = 3
}

■ Old VM: 340ns
■ New VM: 120ns
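
The comparison above can be reproduced with a rough micro-benchmark. This is a sketch under assumptions: nsPerCall is a hypothetical helper, the figures include the cost of the function call wrapping each snippet, and the absolute numbers depend on the machine and VM, so they will not match the slide.

```lua
-- Fields added one at a time: the table may be resized as it grows.
local function fieldByField()
    local v = {}
    v.X = 1
    v.Y = 2
    v.Z = 3
    return v
end

-- Fields given up front in the table constructor.
local function constructor()
    return { X = 1, Y = 2, Z = 3 }
end

-- Hypothetical helper: average nanoseconds per call over `n` calls.
local function nsPerCall(fn, n)
    local start = os.clock()
    for _ = 1, n do
        fn()
    end
    return (os.clock() - start) / n * 1e9
end

print(string.format("field by field: %.0f ns", nsPerCall(fieldByField, 1000000)))
print(string.format("constructor:    %.0f ns", nsPerCall(constructor, 1000000)))
```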




CONCLUSION 






DEBRIEFING 

■ Performance is important! 

■ Measure before, during and after optimization 

■ New VM is wonderful 

■ If you think our VM could be faster, share benchmarks 

■ If you aren’t sure why code is slow, ask on dev forum! 





FUTURE WORK 


We aren’t done yet! All of the following are under consideration 

- More performance improvements! 

- Optional static typing 

- Server-side JIT 

- Multithreaded Lua 

- First-class classes 





Questions? 







REFERENCES 

- Lua Performance Tips, by Roberto Ierusalimschy (Lua author)

■ https://www.lua.org/gems/sample.pdf

- Roblox performance tips

■ https://devforum.roblox.com/t/psa-dont-use-instance-new-with-parent-argument/

■ https://devforum.roblox.com/t/the-art-of-micro-optimizing

- New VM beta

■ https://devforum.roblox.com/t/faster-lua-vm-studio-beta

- Expect future performance articles on Developer Hub

■ https://developer.roblox.com/en-us





BONUS CASE STUDY: AABB 






local function fastComputeAABB(model)
    local axes = {'X', 'Y', 'Z'}

    local min = { math.huge, math.huge, math.huge }
    local max = { -math.huge, -math.huge, -math.huge }
    for _,obj in pairs(model:GetDescendants()) do
        if obj:IsA("BasePart") then
            local cf = obj.CFrame
            local origin, up, right, look = cf.p, cf.upVector, cf.rightVector, cf.lookVector
            local halfSize = obj.Size/2
            local hx,hy,hz = halfSize.X, halfSize.Y, halfSize.Z
            -- visit all 8 corners of the oriented bounding box
            for x = -1,1,2 do
                local worldR = right * (x * hx)
                for y = -1,1,2 do
                    local worldU = up * (y * hy)
                    for z = -1,1,2 do
                        local worldL = look * (z * hz)
                        local vertex = (origin + worldR + worldU + worldL)
                        for axisId = 1,3 do
                            local coord = vertex[axes[axisId]]
                            if coord < min[axisId] then
                                min[axisId] = coord
                            end
                            if coord > max[axisId] then
                                max[axisId] = coord
                            end
                        end
                    end
                end
            end
        end
    end

    return Region3.new(Vector3.new(unpack(min)), Vector3.new(unpack(max)))
end


■ For each part

° For each corner of bounding box

° Compute min & max




Village template, fastComputeAABB(workspace):

■ Old VM: 68ms

■ New VM: 51ms




■ Tier 1: Calls

° fastComputeAABB is full of calls and property reads on Roblox instances and built-in types (obj:IsA, obj.CFrame, obj.Size, cf.upVector, halfSize.X, vertex[axis], ...)





AABB FROM OBB WITH COMPONENT-WISE ABS 

■ We’re taking 8 corners and transforming them to world space:

° M * (center + (±extent.x, ±extent.y, ±extent.z))

■ We’re then taking min/max of each axis, but if we expand matrix transform, we get:

° ±M00 * extent.x + ±M01 * extent.y + ±M02 * extent.z

■ The maximum of this is reached by picking the sign that makes each term positive: max(M00, -M00) = abs(M00)

■ We don’t need to transform 8 corners! 



Details here: https://zeux.io/2010/10/17/aabb-from-obb-with-component-wise-abs/ 
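
The claim above can be checked numerically in plain Lua. The sketch below uses hypothetical helper names and represents the rotation as a 3x3 array of rows with the center and half extents as 3-element arrays. It computes the box once by transforming all 8 corners and once with component-wise abs, then prints both results side by side.

```lua
-- Brute force: transform all 8 corners and take per-axis min/max.
local function aabbFromCorners(center, half, R)
    local min = { math.huge, math.huge, math.huge }
    local max = { -math.huge, -math.huge, -math.huge }
    for sx = -1, 1, 2 do
        for sy = -1, 1, 2 do
            for sz = -1, 1, 2 do
                for axis = 1, 3 do
                    local row = R[axis]
                    local coord = center[axis]
                        + row[1] * sx * half[1]
                        + row[2] * sy * half[2]
                        + row[3] * sz * half[3]
                    if coord < min[axis] then min[axis] = coord end
                    if coord > max[axis] then max[axis] = coord end
                end
            end
        end
    end
    return min, max
end

-- Component-wise abs: one world-space half extent per axis, no corners.
local function aabbFromAbs(center, half, R)
    local min, max = {}, {}
    for axis = 1, 3 do
        local row = R[axis]
        local ws = math.abs(row[1]) * half[1]
            + math.abs(row[2]) * half[2]
            + math.abs(row[3]) * half[3]
        min[axis] = center[axis] - ws
        max[axis] = center[axis] + ws
    end
    return min, max
end

-- Example input (made up): a 4x2x6 box rotated 30 degrees around the Y axis.
local c, s = math.cos(math.rad(30)), math.sin(math.rad(30))
local R = { { c, 0, s }, { 0, 1, 0 }, { -s, 0, c } }
local center, half = { 10, 5, -3 }, { 2, 1, 3 }

local minA, maxA = aabbFromCorners(center, half, R)
local minB, maxB = aabbFromAbs(center, half, R)
for axis = 1, 3 do
    print(minA[axis], minB[axis], maxA[axis], maxB[axis])
end
```

The abs version does 9 multiplications per box instead of visiting 8 corners, which is the saving the next slide exploits.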




local function fasterComputeAABB(model)
    local minx, miny, minz = math.huge, math.huge, math.huge
    local maxx, maxy, maxz = -math.huge, -math.huge, -math.huge

    for _,obj in pairs(model:GetDescendants()) do
        if obj:IsA("BasePart") then
            local cf = obj.CFrame
            local size = obj.Size
            local sx, sy, sz = size.X, size.Y, size.Z

            local x, y, z, R00, R01, R02, R10, R11, R12, R20, R21, R22 = cf:components()

            -- https://zeux.io/2010/10/17/aabb-from-obb-with-component-wise-abs/
            local wsx = 0.5 * (math.abs(R00) * sx + math.abs(R01) * sy + math.abs(R02) * sz)
            local wsy = 0.5 * (math.abs(R10) * sx + math.abs(R11) * sy + math.abs(R12) * sz)
            local wsz = 0.5 * (math.abs(R20) * sx + math.abs(R21) * sy + math.abs(R22) * sz)

            -- currently math.min/math.max are slower
            if minx > x - wsx then minx = x - wsx end
            if miny > y - wsy then miny = y - wsy end
            if minz > z - wsz then minz = z - wsz end

            if maxx < x + wsx then maxx = x + wsx end
            if maxy < y + wsy then maxy = y + wsy end
            if maxz < z + wsz then maxz = z + wsz end
        end
    end

    return Region3.new(Vector3.new(minx, miny, minz), Vector3.new(maxx, maxy, maxz))
end



Village template, fasterComputeAABB(workspace):

■ Old VM: 20ms (3.4x speedup over fastComputeAABB)

■ New VM: 12ms (4.2x speedup over fastComputeAABB)




■ Tier 1: Calls (the math.abs calls in fasterComputeAABB)

° Manually computing math.abs brings new VM time down 12ms -> 10ms

° ...having said that, math builtins will be faster in the future




